Preparations

Notebook extensions


In [ ]:
%%javascript
require(['base/js/utils'],
function(utils) {
    utils.load_extensions('IPython-notebook-extensions-3.x/usability/comment-uncomment');
    utils.load_extensions('IPython-notebook-extensions-3.x/usability/dragdrop/main');
});

In [ ]:
%load_ext autoreload
%autoreload 2

Imports and configuration

We set the path to the config.cfg file using the environment variable 'PYPMJ_CONFIG_FILE'. If you do not have a configuration file yet, please look into the Setting up a configuration file example.


In [ ]:
import os
os.environ['PYPMJ_CONFIG_FILE'] = '/path/to/your/config.cfg'

Now we can import pypmj and numpy. Since the parent directory, which contains the pypmj module, is not automatically in our path, we need to append it first.


In [ ]:
import sys
sys.path.append('..')
import pypmj as jpy
import numpy as np

On import, we receive information on the configured logging and the JCMsuite version, provided the logging level is appropriate. We can get additional information on the version and the license.


In [ ]:
jpy.jcm_license_info()

The versions of JCMsuite and pypmj can be accessed via the module attributes __jcm_version__ and __version__.


In [ ]:
print('Version of JCMsuite:', jpy.__jcm_version__)
print('Version of pypmj:', jpy.__version__)

Extensions

You can get a list of available extensions to pypmj using the extensions module attribute.


In [ ]:
jpy.extensions

Extensions may have additional dependencies or may need data that is not shipped with pypmj. You can load an extension using the load_extension function.


In [ ]:
jpy.load_extension('materials')

Simulation

Preparing and configuring the simulation set

We start by creating a JCMProject-instance describing the project we'd like to run. The mie2D-project is located in a subdirectory of our project catalog, which is configured in the section Data under key projects in the configuration file. We could also specify an absolute path instead. Since we want to leave the actual project untouched, we specify a working_dir into which the project is copied beforehand. JCMgeo and the template conversion, for example, will be executed in this working directory. If we do not specify a working_dir, a folder called current_run will be used in the current directory.

Note: If you did not configure the project directory shipped with pypmj in the configuration, simply specify the absolute path to the mie2D-project here.


In [ ]:
wdir = os.path.abspath('working_dir')
project = jpy.JCMProject('scattering/mie/mie2D', working_dir=wdir)

The JCMProject-instance automatically detected the name of the project file:


In [ ]:
project.project_file_name

If it fails to find a proper file or if it finds multiple project files, it raises an Exception. You can specify the project file name manually using the parameter project_file_name on initialization.

To run simulations using this project, we create a SimulationSet (this could also be a single simulation). The keys necessary to translate the JCM template files (i.e. the .jcmp(t)-files) need to be given as a nested dict with the keys constants, parameters and geometry at the top level. The values for these keys need to be dicts as well, which together provide all keys necessary for the template translation. Their function is as follows:

  • constants: can be of any type, but are not stored in the HDF5 store. This is useful for minor parameters, such as the info level in the project, as these do not change the result of the simulation. It can also be used to pass complicated data, such as material data maps.
  • parameters: All parameters that do not change the geometry, i.e. do not belong to the layout.jcmt template.
  • geometry: All parameters that do change the geometry, i.e. belong to the layout.jcmt template.

If a sequence is provided for any of the parameters or geometry values, loops will be performed (depending on the combination_mode of the SimulationSet).
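
To illustrate how such loops arise, here is a generic sketch (not pypmj internals) of what a 'product'-style combination of sequence-valued keys amounts to; the key names below are hypothetical:

```python
from itertools import product

# Illustrative sketch only (not pypmj internals): with a product-style
# combination_mode, one simulation is scheduled per combination of all
# sequence-valued keys. The key names here are hypothetical.
parameters = {'wavelength': [400.e-9, 500.e-9]}
geometry = {'radius': [0.3, 0.4, 0.5]}

loop_keys = dict(parameters, **geometry)
names = list(loop_keys)
combinations = [dict(zip(names, values))
                for values in product(*loop_keys.values())]
print(len(combinations))  # 2 wavelengths * 3 radii -> 6 simulations
```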

In the mie2D project, there is only one parameter: the radius of the circle. This parameter changes the geometry! We'd like to scan over different radii and, consequently, provide a numpy.array for it. We leave the other two dicts empty.


In [ ]:
mie_keys = {'constants': {},
            'parameters': {},
            'geometry': {'radius': np.linspace(0.3, 0.5, 40)}}

Now, the SimulationSet can be initialized. For now, we also set the storage_base and storage_folder attributes. This will ignore the storage base folder of your configuration and use the folder 'tmp_storage_folder' in the current working directory. And instead of a folder named by the current date, the subfolder will be called 'mie2D_test'.


In [ ]:
simuset = jpy.SimulationSet(project, mie_keys, 
                            storage_folder='mie2D_test',
                            storage_base=os.path.abspath('tmp_storage_folder'))

We are now informed about the directory in which our data is stored as configured in the configuration file and by the duplicate_path_levels parameter. The path is also stored in the attribute storage_dir. It now contains an .h5 database file:


In [ ]:
os.listdir(simuset.storage_dir)

We can now make a schedule for the simulations that we want to perform. This involves that

  1. all parameter combinations are determined,
  2. the simulations are sorted to minimize the number of calls of JCMgeo (which can be expensive), and
  3. the database is checked for matching simulations that have already been solved.

In our case, the database is still empty and we should end up with 40 simulations, as we specified 40 different radii.
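
The sorting step can be illustrated with a toy sketch (this is not pypmj's actual implementation): grouping simulations with identical geometry keys together means the geometry only has to be recomputed when it actually changes.

```python
# Illustrative sketch only (not pypmj internals): sorting the simulations
# by their geometry keys groups identical geometries together, so a
# JCMgeo-like call is only needed when the geometry changes.
sims = [{'radius': 0.4}, {'radius': 0.3}, {'radius': 0.4}, {'radius': 0.3}]
sims_sorted = sorted(sims, key=lambda s: s['radius'])
jcmgeo_calls = sum(1 for i, s in enumerate(sims_sorted)
                   if i == 0 or s != sims_sorted[i - 1])
print(jcmgeo_calls)  # 2 geometry computations instead of 4
```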


In [ ]:
simuset.make_simulation_schedule()

The store should still be empty at this time:


In [ ]:
simuset.is_store_empty()

Depending on the configured servers, there might be multiple workstations or queues which can be used by the JCMdaemon to run the simulations. For this 2D project, we can restrict the resources to be only the local computer, i.e. 'localhost', and use only 2 workers.


In [ ]:
simuset.use_only_resources('localhost')
simuset.resource_manager.resources.set_m_n_for_all(2,1)

You can get a ResourceDict of the currently configured resources that will be used by the SimulationSet using the get_current_resources-method or simply using the resources attribute of the resource_manager.


In [ ]:
simuset.get_current_resources()

Computing geometries only

Computing a geometry (i.e. running jcm.geo) for a specific simulation of our SimulationSet is an easy task now. We only need to call the compute_geometry method with the index of the simulation (or the Simulation instance itself). We can pass additional keyword arguments to jcm.geo, such as a jcmt_pattern if desired.

Let's compute the geometry for the first simulation of our set. The simulation has the following properties:


In [ ]:
sim = simuset.simulations[0]
print(sim)
print('keys:', sim.keys)

Now we run jcm.geo for it (we could also have written simuset.compute_geometry(sim)):


In [ ]:
simuset.compute_geometry(0)

The project's working directory now contains a grid file:


In [ ]:
os.listdir(simuset.get_project_wdir())

Running a single simulation

Before demonstrating how to solve the complete set of simulations, we show how to solve a single simulation using JCMsolve. This can be very useful if the simulation is still being developed or if something needs to be recomputed later. We can solve a specific simulation using the solve_single_simulation method by passing the simulation number or the Simulation-instance. It automatically computes the geometry (if compute_geometry is True) and adds the resources if necessary.


In [ ]:
print('Status before solving:', sim.status)
results, logs = simuset.solve_single_simulation(sim)
print('Status after solving:', sim.status)

All results and logs are also stored as attributes of the Simulation-instance, namely logs, error_message, exit_code and jcm_results, e.g.


In [ ]:
print(sim.logs['Out'])

The fieldbag file path is also set as an attribute:


In [ ]:
sim.fieldbag_file

Developing and using a processing function

So far we have only solved the simulation by (1) computing the geometry and (2) running JCMsolve. This is fine, and it may be that no further step is desired. However, nothing is yet set up to be saved to the HDF5 store, so nothing but the input parameters will appear in it. As a minimal step, we may want to save the computational costs of our simulations. But normally, post processes will be part of your JCM project file, and you may want to (3) extract information from them or even derive further quantities from them by additional processing.

If all you want to do is store the computational cost, this is automatically done by calling the process_results method of the Simulation-instance without further input arguments.


In [ ]:
sim.process_results()

If the logging-level is set to 'DEBUG', you see the message 'No result processing was done.' to inform you that only the computational costs have been read out. The results that will be stored are now described by the (hidden) attribute _results_dict:


In [ ]:
sim._results_dict

The status of the simulation is now updated.


In [ ]:
sim.status

To execute further processing, we can use the processing_func-argument. From the docs:

The `processing_func` must be a function of one or two input arguments.
A list of all results returned by post processes in JCMsolve are passed
as the first argument to this function. If a second input  argument is 
present, it must be called 'keys'. Then, the simulation keys are passed
(i.e. self.keys). This is useful to use parameters of the simulation,
e.g. the wavelength, inside your processing function. It must return a
dict with key-value pairs that should be saved to the HDF5 store. 
Consequently, the values must be of types that can be stored to HDF5, 
otherwise Exceptions will occur in the saving steps.
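
As a hedged sketch of the two-argument form described above: the second argument must be named keys and receives the simulation keys. The function name and the derived quantity 'SCS_norm' below are invented for illustration; they are not part of the mie2D project.

```python
# Hypothetical sketch of a two-argument processing function. The second
# argument must be named 'keys'; it receives the simulation keys
# (self.keys). 'SCS_norm' is an invented example quantity.
def read_scs_with_keys(pp, keys):
    results = {}  # must be a dict of HDF5-storable values
    scs = pp[0]['ElectromagneticFieldEnergyFlux'][0][0].real
    results['SCS'] = scs
    results['SCS_norm'] = scs / keys['radius']  # uses a simulation key
    return results
```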

In the mie2D-project a FluxIntegration is used as a PostProcess-section in mie2D.jcmp. Its results will be passed to the processing_func. As this is the only post process, the length of the list passed to the processing_func will be 1. We can see the list using


In [ ]:
sim.jcm_results[1:]

We'd like to store the real part of the ElectromagneticFieldEnergyFlux of the zeroth source, as this is the scattering cross section (SCS) of our Mie scatterer. Our processing_func will simply look like this:


In [ ]:
def read_scs(pp):
    results = {}  # must be a dict
    results['SCS'] = pp[0]['ElectromagneticFieldEnergyFlux'][0][0].real
    return results

We can try it out by running process_results again, this time passing our custom function. We need to set overwrite to True to force the processing to be executed again.


In [ ]:
sim.process_results(processing_func=read_scs, overwrite=True)

We now have an additional key called 'SCS' in our _results_dict. It will be stored in the HDF5 store.


In [ ]:
sim._results_dict['SCS']

You can get the complete data of the simulation including input parameters and results as a pandas DataFrame:


In [ ]:
sim._get_DataFrame()

Running all simulations

Finally, we can run all our simulations and process them using our custom processing_func. This is done in parallel using all the resources that we have added. The results will be appended to the HDF5 store.


In [ ]:
simuset.run(N=10, processing_func=read_scs)

In [ ]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_context('notebook')

data = simuset.get_store_data().sort_values(by='radius')
data.plot(x='radius', y='SCS', title='Results of the simulation')

Adding data in a later run

Let's assume we want to extend our store in a later session. The automatic data comparison will detect that some of the data is already known and will only execute the missing simulations. The same mechanism helps to continue the SimulationSet at the point where it stopped, e.g. due to a keyboard interrupt or a server error.
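
The effect of this comparison can be sketched with plain numpy (this is illustrative only, not pypmj's matching logic): of all requested radii, only those not yet present in the store need to be solved.

```python
import numpy as np

# Illustrative sketch only (not pypmj's matching logic): the 40 radii
# from the first run are already stored; of the 79 requested radii,
# only the new ones remain to be simulated.
stored = np.linspace(0.3, 0.5, 40)
requested = np.append(np.linspace(0.5, 0.6, 40)[1:],
                      np.linspace(0.3, 0.5, 40))
missing = [r for r in requested if not np.any(np.isclose(stored, r))]
print(len(requested), len(missing))  # 79 requested, 39 still to solve
```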

For demonstration, we extend our list of radii and reinitialize the SimulationSet. To show the power of the comparison mechanism, we provide the radii in a mixed-up order.


In [ ]:
extended_radii = np.append(np.linspace(0.5, 0.6, 40)[1:], np.linspace(0.3, 0.5, 40))
mie_keys_extended = {'constants': {},
                     'parameters': {},
                     'geometry': {'radius': extended_radii}}

We close the store, delete the SimulationSet instance and start all over.


In [ ]:
simuset.close_store()
del simuset

In [ ]:
simuset = jpy.SimulationSet(project, mie_keys_extended)
simuset.make_simulation_schedule()

We are now informed that matches have been found in the HDF5 store. The remaining simulations can be executed as before. This time, we set the wdir_mode to 'zip' to demonstrate the automatic zipping and deletion of the working directories. If the directories are no longer needed, you can set wdir_mode to 'delete' instead. We also demonstrate the failsafe running of SimulationSets using the utility function run_simusets_in_save_mode, which is tolerant of unforeseen errors and restarts the SimulationSet in such cases. It also sends status e-mails if this is configured in the configuration file.


In [ ]:
jpy.utils.run_simusets_in_save_mode(simuset, N=10, processing_func=read_scs, wdir_mode='zip')

The storage folder now only contains the HDF5 store and a zip-archive with all the working directories.


In [ ]:
os.listdir(simuset.storage_dir)

We can now plot our extended results.


In [ ]:
simuset.get_store_data().sort_values(by='radius').plot(x='radius', y='SCS', title='Results of the simulation')

We can also write our data to a CSV or an Excel file using the write_store_data_to_file method:


In [ ]:
simuset.write_store_data_to_file() # default is results.csv in the storage folder
simuset.write_store_data_to_file(os.path.join(simuset.storage_dir, 'results_excel.xls'), mode='Excel')